35 research outputs found
A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems
In this paper, we present a novel cache design based on Multi-Level Cell
Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set
capacity and associativity to use efficiently the full potential of MLC STTRAM.
We exploit the asymmetric nature of the MLC storage scheme to build cache lines
featuring heterogeneous performances, that is, half of the cache lines are
read-friendly, while the other is write-friendly. Furthermore, we propose to
opportunistically deactivate ways in underutilized sets to convert MLC to
Single-Level Cell (SLC) mode, which features overall better performance and
lifetime. Our ultimate goal is to build a cache architecture that combines the
capacity advantages of MLC and performance/energy advantages of SLC. Our
experiments show an improvement of 43% in total numbers of conflict misses, 27%
in memory access latency, 12% in system performance, and 26% in LLC access
energy, with a slight degradation in cache lifetime (about 7%) compared to an
SLC cache
Affine Modeling of Program Traces
This is a post-peer-review, pre-copyedit version of an article published in IEEE Transactions on Computers. The final authenticated version is available online at: http://dx.doi.org/10.1109/TC.2018.2853747[Abstract] A formal, high-level representation of programs is typically needed for static and dynamic analyses performed by compilers. However, the source code of target applications is not always available in an analyzable form, e.g., to protect intellectual property. To reason on such applications it becomes necessary to build models from observations of its execution. This paper presents an algebraic approach which, taking as input the trace of memory addresses accessed by a single memory reference, synthesizes an affine loop with a single perfectly nested statement that generates the original trace. This approach is extended to support the synthesis of unions of affine loops, useful for minimally modeling traces generated by automatic transformations of polyhedral programs, such as tiling. The resulting system is capable of processing hundreds of gigabytes of trace data in minutes, minimally reconstructing 100 percent of the static control parts in PolyBench/C applications and 99.9 percent in the Pluto-tiled versions of these benchmarks.Ministerio de EconomÃa y Competitividad; TIN2016-75845-PNational Science Foundation (Estados Unidos); 1626251National Science Foundation (Estados Unidos); 1409095National Science Foundation (Estados Unidos); 1439057National Science Foundation (Estados Unidos); 1213052National Science Foundation (Estados Unidos); 1439021National Science Foundation (Estados Unidos); 162912
April: A Run-Time Library for Tape-Resident Data
Over the last decade, processors have made enormous gains in speed. But increase in the speed of the secondary and tertiary storage devices could not cope with these gains. The result is that the secondary and tertiary storage access times dominate execution time of data intensive computations. Therefore, in scientific computations, efficient data access functionality for data stored in secondary and tertiary storage is a must. In this paper, we give an overview of APRIL, a parallel runtime library that can be used in applications that process tape-resident data. We present user interface and underlying optimization strategy. We also discuss performance improvements provided by the library on the High Performance Storage System (HPSS). The preliminary results reveal that the optimizations can improve response times by up to 97.2%